Learning an English-Chinese Lexicon from a Parallel Corpus
Abstract
We report experiments on automatic learning of an English-Chinese translation lexicon, through statistical training on a large parallel corpus. The learned vocabulary size is non-trivial at 6,517 English words averaging 2.33 Chinese translations per entry, with a manually-filtered precision of 95.1% and a single-most-probable precision of 91.2%. We then introduce a significance filtering method that is fully automatic, yet still yields a weighted precision of 86.0%. Learning of translations is adaptive to the domain. To our knowledge, these are the first empirical results of the kind between an Indo-European and non-Indo-European language for any significant corpus size with a non-toy vocabulary.
Related resources
Exploring Parallel Concordancing in English and Chinese
This paper investigates the value of computer technology as a medium for the delivery of parallel texts in English and Chinese for language learning. An English-Chinese parallel corpus was created for use in parallel concordancing, a technique which has been developed to respond to the desire to study language in its natural contexts of use. Specific problems of dealing with Chinese characters ...
A Language Learning System with Automatic Feedback: An Application Based on a English-Chinese Parallel Corpus
The learners of English as a second language generally need many practices on writing English, identification of the mistakes in their English, and feedback hints on how to correct their mistakes. A computer-assisted online system is designed to address these issues in a context of learning from a corpus of parallel English-Chinese corpus of New York Times news articles. In the system, students...
Contrastive connectors in English and Chinese: A case
This comparative study of however and its Chinese counterparts in two translation corpora (the HLM parallel corpus, and the Babel English-Chinese Parallel Corpus) reveals that the Chinese contrastive relations tend to be expressed implicitly (cf. Wang and Zheng 2004) and Chinese contrastive connectors are generally used in sentence initial position, whereas the English contrastive relations ten...
Bilingual Parallel Active Learning Between Chinese and English
Active learning is an effective machine learning paradigm which can significantly reduce the amount of labor for manually annotating NLP corpora while achieving competitive performance. Previous studies on active learning are focused on corpora in one single language or two languages translated from each other. This paper proposes a Bilingual Parallel Active Learning paradigm (BPAL), where an ...